An Introduction to Classification and Regression Tree (CART) Analysis
Abstract
Introduction
A common goal of many clinical research studies is the development of a reliable clinical decision rule that can be used to classify new patients into clinically important categories. Examples of such rules include triage rules, whether used in the out-of-hospital setting or in the emergency department, and rules used to classify patients into risk categories so that appropriate decisions can be made regarding treatment or hospitalization.
Traditional statistical methods are cumbersome to use, or of limited utility, in addressing these types of classification problems, for several reasons. First, there are generally many possible "predictor" variables, which makes variable selection difficult; traditional statistical methods are poorly suited to this sort of multiple comparison. Second, the predictor variables are rarely nicely distributed. Many clinical variables are not normally distributed, and different groups of patients may have markedly different variances. Third, complex interactions or patterns may exist in the data. For example, the value of one variable (e.g., age) may substantially affect the importance of another variable (e.g., weight). These interactions are generally difficult to model, and virtually impossible to model when the number of interactions and variables becomes substantial. Fourth, the results of traditional methods may be difficult to use. For example, a multivariate logistic regression model yields a probability of disease, which can be calculated from the regression coefficients and the characteristics of the patient, yet such models are rarely used in clinical practice. Clinicians generally do not think in terms of probability but rather in terms of categories, such as "low risk" versus "high risk."
Regardless of the statistical methodology being used, the creation of a clinical decision rule requires a relatively large dataset. For each patient in the dataset, one variable (the dependent variable) records whether or not that patient had the condition that we hope to predict accurately in future patients. Examples might include significant injury after trauma, myocardial infarction, or subarachnoid hemorrhage in the setting of headache. In addition, other variables record the values of patient characteristics that we believe might help us to predict the value of the dependent variable. For example, if one hopes to predict the presence of subarachnoid hemorrhage, a possible predictor variable might be whether or not the patient's headache was sudden in onset; another possible …
Similar resources
Forest Stand Types Classification Using Tree-Based Algorithms and SPOT-HRG Data
Forest type mapping is one of the most necessary elements of forest management and silvicultural treatment. Traditional methods such as field surveys are time-consuming and cost-intensive. Improvements in remote sensing data sources and classification-estimation methods are creating new opportunities for obtaining more accurate maps of forest biophysical attributes. This research co...
Comparing the Results of Logistic Regression Model and Classification and Regression Tree Analysis in Determining Prognostic Factors for Coronary Artery Disease in Mashhad, Iran
Background and purpose: Understanding of the risk factors for cardiovascular artery disease, which is the leading cause of death worldwide, can lead to essential changes in its etiology, prevalence, and treatment. The aim of this study was to compare the results of logistic regression model and Classification and Regression Tree Analysis (CART) in determining the prognostic factors for coronary...
Prediction of melting points of a diverse chemical set using fuzzy regression tree
Classification and regression trees (CART) have the advantage of being able to handle large data sets and yield readily interpretable models. In spite of these advantages, they are also recognized as highly unstable classifiers with respect to minor perturbations in the training data. In other words, these methods exhibit high variance. Fuzzy logic brings an improvement in these aspects due ...
Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis
Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses so that they can select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can be complementary to regression models. We compared the performance of a logistic regression model and a CART in predi...
Using a Classification Tree Model to Determine Factors Affecting Mortality after Coronary Artery Bypass Surgery in Non-Dialysis-Dependent Patients
Background and Objective: Coronary artery disease is one of the most prevalent causes of death. A coronary artery bypass surgery is a common treatment for this disease. In addition, renal dysfunction can lead to increased mortality and post-operative complications. This study aimed to identify the most important factors influencing the mortality of patients who suffer from coronary ar...
Prediction of potential habitat distribution of Artemisia sieberi Besser using data-driven methods in Poshtkouh rangelands of Yazd province
The present study aimed to model the potential habitat distribution of A. sieberi and its ecological requirements using a generalized additive model (GAM) and a classification and regression tree (CART) in the Poshtkouh rangelands of Yazd province. For this purpose, pure habitats of the species were delineated and species presence data were recorded by the systematic-random sampling method. Us...